Abstract

The Armed Services Vocational Aptitude Battery (ASVAB) has historically been validated against technical
school grades. Job performance measurement (JPM) data were collected in the 1980s to link ASVAB cut scores
to more realistic job performance measures for eight Air Force Specialties (AFSs). Due to the expense of these
work samples, a major challenge has been to find reliable and valid surrogates. Data on 16 hands-on tasks were
collected for 261 first-term Aerospace Ground Equipment Mechanics (AFS 423x5). These tasks,
selected by subject matter experts (SMEs) at special JPM workshops, ranged from seven to 26 steps and
from low to high difficulty. Three data sets were generated for each task: members getting all, some, or none of
the task’s steps correct. For all tasks, the largest numbers of personnel were associated with getting some steps
correct. Generally, less-difficult tasks were associated with larger numbers of personnel getting all steps correct.

Members getting all steps correct and those getting some steps correct tended to have similar average task
performance times, and both tended to be faster than members getting no steps correct. A correlation of .79 (p < .001) was found
between the average time for all steps correct and independent SME estimates of the average time required by
first-termers for task completion. Members with all steps correct tended to have more recent task experience,
more overall task experience, more average task performance per month, higher task experience ratings, higher
Armed Forces Qualification Test (AFQT) scores, and higher grades in technical school. Future research needs
are discussed.


Introduction 

The  United  States  Air  Force  (USAF)  has  historically  validated  the  g-loaded  (Ree  &  Earles,  1992)  Armed 
Services  Vocational  Aptitude  Battery  (ASVAB)  against  grades  in  technical  school.  ASVAB  scores  are  used  as 
part  of  the  process  of  USAF  enlisted  selection  and  classification.  While  grades  in  technical  school  are  important, 
assuming  that  the  content  of  technical  school  training  and  testing  accurately  reflects  the  content  of  the  relevant 
Air  Force  Specialty  (AFS),  they  do  not  cover  the  entire  job  performance  criterion  space.  Because  of  this  lack  of 
coverage  and  a  problem  with  the  norming  of  the  ASVAB,  in  1980  the  Office  of  the  Secretary  of  Defense  started  a 
Joint-Service  program  to  research  the  Job  Performance  Measurement  (JPM)  of  enlisted  personnel.  The  resulting 
performance  measures  were  to  be  used  to  directly  link  enlistment  standards  to  job  performance.  JPM  data  were 
collected for eight AFSs over a period of three years in the mid-1980s, and have been analyzed using numerous
methodologies. The current study analyzed these data in a new way for Aerospace Ground Equipment (AGE)
Mechanics (AFS 423x5). AGE was one of the last four AFSs for which data were collected. The current analyses
used three data sets for each task: members getting all, some, or none of the task’s steps correct. Armed Forces
Qualification  Test  (AFQT)  scores,  final  technical  school  grades,  four  task  experience  measures,  and  task 
difficulty  were  examined  as  predictors  of  task  completion.  One  purpose  of  the  study  was  to  examine  how  well 
each  of  these  measures  predicted  membership  in  each  of  the  three  data  sets  for  each  task.  Successful  task 
completion  times  for  getting  all  of  the  task’s  steps  correct  were  also  compared  against  independent  expert 
estimates  of  the  average  task  completion  times  required  by  all  first-termers  for  successful  task  performance. 

Method 

The  examinees  were  261  USAF  AGE  enlisted  incumbents  (240  males,  21  females;  229  Whites,  20  Blacks,  12 
other)  in  their  first  four  years  of  enlistment  (Laue,  Bentley,  Bierstedt,  &  Molina,  1992).  The  mean  experience  for 
the  examinees  was  28.4  months.  A  work  sample  of  16  AGE  tasks  was  tested  using  hands-on  performance  testing 
to  assess  the  job  performance  of  the  incumbents  on  tasks  representative  of  their  Air  Force  Specialty  (AFS).  The 
16 tasks were chosen partially on the basis of job analyses of their frequency of performance and difficulty. Prior
to task selection, as part of routine occupational inventory surveys by the Air Force Occupational Measurement
Squadron, subject matter experts (SMEs) rated task difficulty on a 9-point scale, with higher numbers indicating
more difficult tasks. Another set of SMEs was used to select tasks that represented
each  AFS  and  could  be  tested,  given  equipment  and  testing  time  constraints.  These  SMEs  also  defined  the  tasks 
in  terms  of  the  steps  involved  in  successful  task  completion  and  the  equipment  used.  For  purposes  of  task 
selection,  more  difficult  tasks  were  given  a  priority  compared  to  less  difficult  tasks. 

The  work  sample  tests  required  the  incumbents  to  perform  the  tasks  while  being  observed  by  trained  test 
administrators.  These  extensively  trained  test  administrators  were  provided  information  from  SMEs  concerning 
task  objective,  task  time  limits,  and  tools  and  equipment  required  to  perform  the  task.  The  examinees  were
allowed  access  to  routinely  used  technical  documents  and  were  instructed  to  perform  each  task  according  to 
normal  procedures.  If  they  asked  why  they  were  being  timed,  examinees  were  told  that  their  task  times  were 
being  collected  only  for  administrative  purposes  (i.e.,  in  order  to  finish  all  the  scheduled  testing  in  the  allotted 
time).  For  scoring  purposes,  each  task  was  divided  into  the  steps  necessary  for  successful  task  completion.  While 
an  examinee  performed  a  task,  the  test  administrator  marked  on  a  checklist  whether  or  not  each  step  was 
correctly performed. If an examinee reached the maximum time limit for a task, all uncompleted steps were recorded
as incorrect. The 16 tasks ranged from seven steps ("Research technical orders for AGE chassis, enclosure, and
drive maintenance information") to 26 steps ("Perform a gas turbine compressor inspection"). Four
task-level  experience  measures  were  used:  1)  weeks  since  last  performed;  2)  number  of  times  performed;  3) 
average  times  performed  per  month  (calculated  by  dividing  number  of  times  performed  by  job  experience);  and 
4)  task  experience  ratings  using  a  5-point  scale.  Of  the  five  variables  mentioned  above,  only  job  experience  was 
not  self-reported. 
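
The scoring rule and the per-month experience measure described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual scoring procedure: the function names `score_task` and `times_per_month`, the checklist representation (a list of booleans, with `None` for steps not reached before the time limit), and the use of months as the unit of job experience are all assumptions made for the example.

```python
from typing import List, Optional


def score_task(checklist: List[Optional[bool]]) -> str:
    """Classify one examinee's checklist for one task as 'all', 'some', or 'none'.

    Each entry is True (step correct), False (step incorrect), or None
    (step not completed before the time limit). Uncompleted steps are
    counted as incorrect, as described in the text.
    """
    if not checklist:
        raise ValueError("a task must have at least one step")
    n_correct = sum(1 for step in checklist if step is True)
    if n_correct == len(checklist):
        return "all"
    if n_correct == 0:
        return "none"
    return "some"


def times_per_month(times_performed: int, job_experience_months: float) -> float:
    """Experience measure 3: number of times performed divided by job experience."""
    if job_experience_months <= 0:
        raise ValueError("job experience must be positive")
    return times_performed / job_experience_months


# Hypothetical seven-step task: the examinee ran out of time after five steps.
print(score_task([True, True, False, True, True, None, None]))  # some
print(times_per_month(12, 24.0))  # 0.5
```

Treating `None` and `False` identically mirrors the rule that steps left uncompleted at the time limit were recorded as incorrect.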

The Armed Forces Qualification Test (AFQT) was used as the aptitude measure and is composed of four subtests
of the ASVAB. The ASVAB has 10 ability subtests, each scored on a standard scale (M = 50, SD = 10) with a
range from 20 to 80 points. The AFQT is the sum of two of these scores (Arithmetic Reasoning and Mathematics
Knowledge) plus twice the Verbal score (a sum of Word Knowledge and Paragraph Comprehension).
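
As a rough sketch, the composite described in this paragraph can be written out as follows. The code follows the verbal description literally; the function and argument names are hypothetical, and it computes only the raw composite without attempting any conversion to a reported score scale.

```python
def verbal_score(word_knowledge: float, paragraph_comprehension: float) -> float:
    """Verbal score as described in the text: Word Knowledge plus Paragraph Comprehension."""
    return word_knowledge + paragraph_comprehension


def afqt_composite(arithmetic_reasoning: float,
                   math_knowledge: float,
                   word_knowledge: float,
                   paragraph_comprehension: float) -> float:
    """Raw AFQT composite: the two math-related subtests plus twice the Verbal score."""
    verbal = verbal_score(word_knowledge, paragraph_comprehension)
    return arithmetic_reasoning + math_knowledge + 2 * verbal


# Hypothetical examinee scoring at the subtest mean (50) on all four subtests.
print(afqt_composite(50, 50, 50, 50))  # 300
```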

Average  task  completion  times  and  Ns  were  determined  for  three  data  sets  for  each  of  the  16  tasks  —  members 
getting  all,  some,  or  none  of  the  task’s  steps  correct.  Independent  SME  estimates  of  average  successful  task 
completion  times  for  all  first-termers  were  compared  against  the  actual  average  times.  Differences  among  the 
three  data  sets  in  the  four  task  experience  measures,  AFQT  scores,  and  technical  school  final  grades  are  reported. 
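
The per-task summary described above (an average completion time and an N for each of the three data sets) could be computed along the following lines. This is a minimal sketch under assumed inputs: the `(group, minutes)` record layout, the group labels, and the function name `summarize_task` are illustrative and do not come from the original data files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

GROUPS = ("all", "some", "none")


def summarize_task(
    records: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[Optional[float], int]]:
    """Return {group: (average minutes or None, N)} for one task.

    `records` holds one (group, completion time in minutes) pair per examinee.
    A group with no examinees gets an average of None, shown as N/A in a table.
    """
    times: Dict[str, List[float]] = defaultdict(list)
    for group, minutes in records:
        if group not in GROUPS:
            raise ValueError(f"unknown group: {group!r}")
        times[group].append(minutes)

    summary: Dict[str, Tuple[Optional[float], int]] = {}
    for group in GROUPS:
        values = times.get(group, [])
        average = sum(values) / len(values) if values else None
        summary[group] = (average, len(values))
    return summary


# Hypothetical records for a single task (not taken from the study data).
example = [("all", 8.0), ("some", 6.0), ("some", 8.0)]
print(summarize_task(example))
```

Returning `None` for an empty group keeps the distinction between "no examinees in this data set" and an average time of zero.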

Results 

As  reported  in  Table  1,  for  all  16  tasks  the  largest  numbers  of  personnel  were  associated  with  getting  some  steps 
correct. Less-difficult tasks were associated with larger numbers of personnel getting all steps correct (r = -.567,
p = .022). Across the 16 tasks, the average percentage of personnel with all steps correct was 7.5%, some steps
correct  85.2%,  and  no  steps  correct  7.3%.  For  one  task,  "Remove  or  install  hydraulic  lines  or  fittings,"  all 
personnel  got  only  some  of  the  steps  correct.  This  task  had  only  eight  steps,  a  mid-level  task  difficulty  of  4.86, 
and  mid-level  time  limit  of  15  minutes.  For  an  additional  task,  "Perform  gas  turbine  compressor  inspections,"  no 
personnel  got  all  steps  correct.  This  was  the  second  most  difficult  task  and  had  26  steps.  It  also  had  the  longest 
time  limit  (45  minutes). 
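
The correlation between task difficulty and the number of personnel with all steps correct can be recomputed from the task difficulties and all-steps-correct Ns reported in Table 1. The sketch below uses a plain Pearson product-moment correlation; the helper name `pearson_r` and the choice of Pearson's r (the text does not name the coefficient) are assumptions.

```python
import math
from typing import List, Tuple

# (task difficulty, N with all steps correct), read from Table 1 in table order.
TABLE_1: List[Tuple[float, int]] = [
    (5.08, 16), (4.49, 2), (6.47, 5), (5.31, 4),
    (5.13, 33), (4.59, 19), (4.06, 31), (4.09, 30),
    (4.76, 32), (6.03, 2), (6.25, 0), (5.74, 22),
    (4.86, 0), (3.59, 98), (5.83, 6), (3.71, 9),
]


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Pearson product-moment correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r(TABLE_1), 3))  # -0.567
```

Under these assumptions the tabled values reproduce the reported coefficient of -.567.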

A correlation of .790 (p = .0008) was found between the average times for all steps correct (N=261) and
independent  SME  estimates  of  the  average  time  required  by  all  first-termers  for  successful  task  completion. 
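
One way to relate this correlation to its reported probability is the usual t transformation of r. The sketch below assumes that the correlation was computed across tasks and that n = 14, the number of tasks in Table 1 with at least one examinee getting all steps correct; the text does not state this, so both the value of n and the function name `t_from_r` are assumptions.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least three pairs are needed")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r_reported = 0.790
n_tasks = 14  # assumption: tasks with a non-missing all-steps-correct average time
t_value = t_from_r(r_reported, n_tasks)
print(round(t_value, 2), n_tasks - 2)  # 4.46 12
```

Under this assumption the statistic exceeds 4.318, the two-tailed .001 critical value for 12 degrees of freedom, which would be in line with the reported probability.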

Table  1.  Summary  task  information  using  three  data  sets  for  each  task 
(Task difficulty) Task description (Task time limit in min.)    All steps correct       Some steps correct      No steps correct
                                                                Avg. time in min. (N)   Avg. time in min. (N)   Avg. time in min. (N)

(5.08) Perform AGE electrical checks (20)                       12.8 (16)               [illegible]             19.5 (40)
(4.49) Perform load bank service inspections (12)               8 (2)                   6.8 (233)               9 (3)
(6.47) Adjust turbine engine fuel system components (20)        17 (5)                  16.6 (133)              19.9 (123)
(5.31) Measure resistance of AGE electrical components (20)     11.8 (4)                9.9 (252)               20 (3)
(5.13) Perform generator service inspections (15)               8.8 (33)                [illegible]             N/A (0)
(4.59) Research tech orders, charts, or diagrams (12)           6.9 (19)                6.3 (235)               10.5 (6)
(4.06) Splice electrical system wiring (25)                     14.7 (31)               15.0 (229)              25 (1)
(4.09) Remove or install fuel lines or fittings (12)            5.6 (30)                5.4 (231)               N/A (0)
(4.76) Clean motor or generator armature (15)                   8.7 (32)                9.1 (191)               14.9 (36)
(6.03) Isolate engine, motor, or generator malfunctions (35)    22.5 (2)                21.1 (258)              35 (1)
(6.25) Perform gas turbine compressor inspections (45)          N/A (0)                 39.2 (237)              45 (24)
(5.74) Perform hydraulic test stand service inspections (15)    9.5 (22)                10.4 (239)              N/A (0)
(4.86) Remove or install hydraulic lines or fittings (15)       N/A (0)                 6.4 (261)               N/A (0)
(3.59) Remove and replace engine fan belts (25)                 8.1 (98)                7.5 (161)               2.5 (2)
(5.83) Isolate pneumatic system malfunctions (25)               8.8 (6)                 17.1 (187)              24.8 (65)
(3.71) Inspect vehicles for safety of operations (12)           9.1 (9)                 9.6 (252)               N/A (0)

Figure 1 presents a scatterplot for these data points. The most variation between the estimated and actual
averages was for the tasks estimated by the experts as requiring 15 minutes.

Figure 1. Subject matter expert estimates of average task times required by all first-termers for successful task
completion versus average times for all steps correct (N=261).


Numerous small cell sizes make analyses of variance problematic for the three data sets for each task. All steps
correct  and  some  steps  correct  tended  to  be  close  in  average  task  performance  time,  as  well  as  having  smaller 
average  times  than  no  steps  correct.  For  10  tasks,  the  longest  task  times  were  associated  with  no  steps  correct.  As 
reported  in  Table  2,  using  tasks  in  the  same  order  as  in  Table  1,  members  with  all  steps  correct  usually  had  the 
most  recent  task  experience,  most  overall  task  experience,  most  average  task  performance  per  month,  highest 
task  experience  ratings,  highest  Armed  Forces  Qualification  Test  (AFQT)  scores,  and  highest  grades  in  technical 
school.  On  the  average,  those  examinees  who  were  correct  on  all  steps  for  the  first  task  in  the  table  had 
performed  it  5.6  months  ago,  those  correct  on  no  steps  had  performed  it  15.6  months  ago,  while  those  correct  on 
some  steps  had  performed  it  16.1  months  ago.  Therefore  those  examinees  with  most  recent  average  task 
experience,  the  smallest  average  time  since  last  performing  this  task,  had  all  steps  correct.  Table  2  reflects  this  by 
having E1, the symbol used for the most recent mean task experience, in the column for all steps correct for the
first  task.  For  seven  tasks,  members  with  all  steps  correct  had  the  most  recent  task  experience.  For  nine  tasks, 
members  with  all  steps  correct  had  the  most  overall  task  experience.  For  eight  tasks,  members  with  all  steps 
correct had the most task experience per month. For 11 tasks, members with all steps correct had the highest task
experience ratings. For 11 tasks, members with all steps correct had the highest AFQT scores. For 13 tasks,
members  with  all  steps  correct  had  the  highest  final  technical  school  grades.  Members  with  no  steps  correct 
tended  to  be  at  the  other  extreme  on  these  six  measures,  while  members  with  some  steps  correct  tended  to  be  in 
the  middle. 
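
The rank-ordering logic behind Table 2 can be illustrated with a short sketch: for each measure, find the data set with the best mean, where "best" means the smallest value for time since last performance and the largest value for the other measures. The function name `best_group` and the dictionary layout are assumptions; the example values are the mean months since last performance quoted above for the first task.

```python
from typing import Dict, Optional


def best_group(means: Dict[str, Optional[float]],
               lower_is_better: bool = False) -> Optional[str]:
    """Return the data set ('all', 'some', or 'none') with the best mean.

    Data sets with no examinees are passed as None and ignored. If fewer
    than two data sets have a mean, there is nothing to rank and the
    result is None (shown as N/A in a table).
    """
    available = {group: mean for group, mean in means.items() if mean is not None}
    if len(available) < 2:
        return None
    pick = min if lower_is_better else max
    return pick(available, key=available.get)


# Mean months since the first task was last performed, as quoted in the text.
months_since_last = {"all": 5.6, "some": 16.1, "none": 15.6}
print(best_group(months_since_last, lower_is_better=True))  # all
```

With these values the most recent experience falls in the all-steps-correct data set, which is why E1 appears in that column for the first task.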

Table 2. Partial rank orderings for means of four experience measures, AFQT, and final school grade within each task
TD      All steps correct             Some steps correct        No steps correct

5.08    E1, E4, AFQT, FSG             E2, E3
4.49    E2, E3, E4, FSG               E1, AFQT
6.47    AFQT, FSG                     E2, E3, E4                E1
5.31    E2, E3, E4, AFQT, FSG         E1
5.13    E1, E2, E3, E4, AFQT, FSG                               N/A
4.59    E2, E3, E4, AFQT, FSG         E1
4.06    E1, E2, E3, E4, AFQT          FSG
4.09    AFQT, FSG                     E1, E2, E3, E4            N/A
4.76    E1, E2, E3, E4, AFQT, FSG
6.03    E1, E4, FSG                   E2, E3                    AFQT
6.25    N/A                           E1, E2, E3, E4, AFQT      FSG
5.74    E1, E2, E3, E4, AFQT, FSG                               N/A
4.86    N/A                           N/A                       N/A
3.59    E2, FSG                       E1, E3                    AFQT
5.83    E1, E4, AFQT, FSG             E2, E3
3.71    E2, E3, E4, AFQT, FSG         E1                        N/A

Note. TD = task difficulty for the task; E1 = most recent mean task experience; E2 = most overall mean task
experience;  E3  =  most  mean  task  experience  per  month;  E4  =  highest  mean  task  experience  ratings;  AFQT  = 
highest  mean  AFQT  score;  FSG  =  highest  mean  final  school  grade.  Experience  ratings  were  not  collected  for  the 
task  with  a  TD  of  3.59. 

Discussion 

The  most  interesting  result  was  the  surprisingly  large  .79  correlation  between  SME  estimates  of  the  average 
times  required  for  successful  task  performance  by  first-termers  and  the  results  from  the  study  for  all  steps 
correct.  This  indicates  that  SME  time  estimates  may  be  useful  for  job/AFS  restructuring.  For  such  purposes,  the 
SMEs  would  need  to  agree  on  the  steps  and  equipment  involved  in  performing  a  task,  as  they  did  during 
workshops  conducted  for  this  study  prior  to  data  collection. 

Data  collected  for  the  JPM  project  cost  approximately  one  million  dollars  for  each  AFS,  or  $5000  for  each 
airman  in  the  sample.  It  is  doubtful  that  anything  similar  to  this  massive  amount  of  carefully  conceptualized  work 
sample  data  will  be  collected  and  available  for  use  in  the  near  future.  Therefore,  it  is  important  that  the  lessons 
learned  from  this  project  be  applied  to  all  relevant,  smaller  work  sample  data-collection  efforts.  These  lessons 
include careful standardization of tasks, instructions that always tell examinees why they are
being timed, timing of individual steps in addition to complete tasks, and perhaps even videotaping task performance if
Privacy  Act  concerns  can  be  addressed.  Having  times  at  the  step  level  in  addition  to  the  task  level  could  have 
important  implications  for  the  training  communities,  in  addition  to  the  manpower  and  classification  communities. 